Search CORE

24 research outputs found

Towards an Achievable Performance for the Loop Nests

Author: A Darte
AH Ashouri
AW Lim
DA Padua
G Fursin
Georgios Tournavitis
J Demšar
K Kennedy
K Stock
MJ Wolfe
Padua
R Allen
R Cammarota
T Grosser
U Bondhugula
W Li
Zhangxiaowen Gong
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the code on a target architecture, including by exposing parallelism. Finding and evaluating an optimal, semantics-preserving sequence of transformations is a complex problem. The sequence is guided by heuristics and/or analytical models, and there is no way of knowing how close it gets to optimal performance or whether there is any headroom for improvement. This paper makes two contributions. First, it uses a comparative analysis of loop optimizations/transformations across multiple compilers to determine how much headroom may exist for each compiler. Second, it presents an approach to characterize loop nests based on their hardware performance counter values, together with a Machine Learning approach that predicts which compiler will generate the fastest code for a loop nest. The prediction is made both for auto-vectorized serial compilation and for auto-parallelization. The results show that the headroom for state-of-the-art compilers ranges from 1.10x to 1.42x for the serial code and from 1.30x to 1.71x for the auto-parallelized code. These results are based on the Machine Learning predictions.
Comment: Accepted at the 31st International Workshop on Languages and Compilers for Parallel Computing (LCPC 2018).

arXiv.org e-Print Archive

Crossref

eScholarship - University of California
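The headroom figures reported for "Towards an Achievable Performance for the Loop Nests" can plausibly be read as the speedup available if, instead of always using one compiler, the compiler predicted to be fastest were chosen for each loop nest. The sketch below illustrates that arithmetic only; it is not the paper's method. The compiler names, runtimes and counter-derived feature vectors are hypothetical toy inputs, the 1-nearest-neighbour rule is a stand-in for the paper's Machine Learning model, and aggregating per-loop ratios with a geometric mean is an assumption.

```python
import math

# Hypothetical toy inputs, made up for illustration (not data from the paper):
# measured runtime in seconds of each loop nest under each compiler.
runtimes = {
    "loop_a": {"gcc": 1.20, "icc": 1.00, "clang": 1.10},
    "loop_b": {"gcc": 0.80, "icc": 0.95, "clang": 0.90},
    "loop_c": {"gcc": 2.00, "icc": 1.60, "clang": 1.50},
}

# Hypothetical per-loop feature vectors, e.g. normalized hardware counter rates.
features = {
    "loop_a": [0.9, 0.1],
    "loop_b": [0.2, 0.8],
    "loop_c": [0.8, 0.3],
}


def fastest_compiler(times):
    """Return the compiler with the smallest measured runtime."""
    return min(times, key=times.get)


def predict_compiler(loop, features, runtimes):
    """Assumed 1-nearest-neighbour predictor: return the fastest compiler
    of the most similar *other* loop nest in feature space."""
    others = [name for name in features if name != loop]
    nearest = min(others, key=lambda name: math.dist(features[loop], features[name]))
    return fastest_compiler(runtimes[nearest])


def headroom(compiler, runtimes, choose):
    """Geometric mean, over loop nests, of the runtime of `compiler`
    divided by the runtime of the compiler picked by `choose(loop)`."""
    ratios = [runtimes[l][compiler] / runtimes[l][choose(l)] for l in runtimes]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

For example, `headroom("gcc", runtimes, lambda l: fastest_compiler(runtimes[l]))` gives the oracle headroom, while passing `lambda l: predict_compiler(l, features, runtimes)` gives the prediction-based headroom. The oracle value is never below 1.0, and a prediction-based value can never exceed it, because a mispredicted loop nest only shrinks that loop's ratio.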

AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs

Author: Ao Y.
Bondhugula U.
Bondhugula Uday
Chi Y.
de Fine Licht Johannes
Grosser Tobias
Hagedorn Bastian
Holewinski Justin
Irigoin F.
Kamil Shoaib
Konstantinidis E.
Krishnamoorthy Sriram
Maruyama Naoya
Meng Jiayuan
Muranushi Takayuki
Nguyen A.
Prajapati Nirmal
Ravishankar Mahesh
Rawat P. S.
Rawat Prashant
Rawat Prashant Singh
Rossinelli Diego
Shimokawabe Takashi
Tang W. T.
Tang Yuan
Verdoolaege Sven
Williams Samuel
Wolfe M.
Zohouri H. R.
Zohouri Hamid Reza
Publication venue: Association for Computing Machinery (ACM)
Publication date: 03/02/2020

Stencil computation is one of the most widely used compute patterns in high-performance computing applications. Spatial and temporal blocking have been proposed to overcome the memory-bound nature of this type of computation by moving memory pressure from external memory to on-chip memory on GPUs. However, implementing these optimizations correctly while accounting for the complexity of GPU architectures and memory hierarchies, and still achieving high performance, is difficult. We propose AN5D, an automated stencil framework capable of automatically transforming and optimizing stencil patterns in a given C source code and generating the corresponding CUDA code. Parameter tuning in our framework is guided by our performance model. Our novel optimization strategy reduces shared memory and register pressure in comparison to existing implementations, allowing performance scaling up to a temporal blocking degree of 10. We achieve the highest performance reported so far for all evaluated stencil benchmarks on the state-of-the-art Tesla V100 GPU.

arXiv.org e-Print Archive

Crossref
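Temporal blocking, as described in the AN5D abstract, advances several time steps per pass over a block of the grid so that intermediate values can stay in fast on-chip memory. The plain-Python sketch below shows only that general idea, using overlapped (ghost-zone) tiling on a hypothetical 1D three-point averaging stencil with fixed boundary values; it is not AN5D's CUDA code generator and does not model its shared-memory or register strategy. The function names and the parameters `block` and `degree` are illustrative, with `degree` playing the role of the temporal blocking degree.

```python
def jacobi_step(u):
    """One time step of a 3-point averaging stencil; the two endpoints are fixed."""
    v = u[:]
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v


def naive(u, steps):
    """Reference: sweep the whole array once per time step."""
    for _ in range(steps):
        u = jacobi_step(u)
    return u


def temporally_blocked(u, steps, block, degree):
    """Overlapped temporal blocking: each block loads a halo of width t
    (t <= degree) on both sides and advances t steps locally. Halo cells go
    stale one cell per step from the local edges, so after t steps only the
    block's own cells are guaranteed correct, and only those are written back."""
    n = len(u)
    done = 0
    while done < steps:
        t = min(degree, steps - done)  # time steps fused in this pass
        out = u[:]
        for start in range(0, n, block):
            end = min(start + block, n)
            lo, hi = max(0, start - t), min(n, end + t)  # block plus halo
            local = u[lo:hi]
            for _ in range(t):
                local = jacobi_step(local)
            out[start:end] = local[start - lo:end - lo]
        u = out
        done += t
    return u
```

The blocked version should match the naive sweep for any `block` and `degree`; the price of a larger `degree` is a wider halo that neighbouring blocks recompute redundantly.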

A Compiler-Assisted OpenMP Migration Method Based on Automatic Parallelizing Information

Author: C. Dave
C. Liao
D. Mustafa
U. Bondhugula
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Crossref

Automatic Privatization for Parallel Execution of Loops

Author: A. Beletska
P. Feautrier
U. Bondhugula
W. Pugh
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Crossref

PolyGLoT: A Polyhedral Loop Transformation Framework for a Graphical Dataflow Language

Author: M. Wolfe
N. Ellmenreich
P. Feautrier
S. Abu-Mahmeed
U. Bondhugula
Publication venue: Springer-Verlag Berlin
Publication date: 01/01/2013

Polyhedral techniques for program transformation are now used in several proprietary and open-source compilers. However, most research on polyhedral compilation has focused on imperative languages such as C, where the computation is specified in terms of statements with zero or more nested loops and other control structures around them. Graphical dataflow languages, which have no notion of statements or of a schedule specifying their relative execution order, have so far not been studied with such a powerful transformation or optimization approach. The execution semantics and referential transparency of dataflow languages pose a different set of challenges. In this paper, we attempt to bridge this gap by presenting techniques to extract a polyhedral representation from dataflow programs and to synthesize dataflow programs from their equivalent polyhedral representation. We then describe PolyGLoT, a framework for automatic transformation of dataflow programs, which we built using our techniques and other popular research tools such as Clan and Pluto. For the experimental evaluation, we used our tools to compile programs written in LabVIEW, one of the most widely used dataflow programming languages. Results show that dataflow programs transformed using our framework outperform those compiled otherwise by up to a factor of seventeen, with a mean speed-up of 2.30x on an 8-core Intel system.

Crossref

Open Access Repository of IISc Research Publications
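A polyhedral representation, of the kind PolyGLoT extracts and tools such as Pluto transform, describes a loop nest by its iteration domain (the integer points satisfying a set of affine inequalities) and a schedule that orders those points. The plain-Python sketch below illustrates these two ingredients for a hypothetical triangular loop nest; it does not show PolyGLoT's extraction from LabVIEW dataflow diagrams, brute-force enumeration stands in for the parametric polyhedral libraries real tools use, and the dependence distance vectors are assumed for illustration.

```python
from itertools import product

# Hypothetical loop nest:  for i in 0..N-1: for j in 0..i: S(i, j)
# Each constraint (a, b, c) encodes the affine inequality a*i + b*j + c >= 0.
N = 4
domain = [
    (1, 0, 0),       # i >= 0
    (-1, 0, N - 1),  # i <= N - 1
    (0, 1, 0),       # j >= 0
    (1, -1, 0),      # j <= i
]

IDENTITY = ((1, 0), (0, 1))     # original order: (i, j)
INTERCHANGE = ((0, 1), (1, 0))  # interchanged order: (j, i)


def points(constraints, bound):
    """Enumerate integer points of the domain inside the box [0, bound)^2."""
    return [
        (i, j)
        for i, j in product(range(bound), repeat=2)
        if all(a * i + b * j + c >= 0 for a, b, c in constraints)
    ]


def schedule_image(schedule, p):
    """Logical execution time of iteration p under a 2x2 linear schedule."""
    return tuple(sum(row[k] * p[k] for k in range(2)) for row in schedule)


def is_legal(schedule, domain_pts, distance):
    """Legal if, for the uniform dependence `distance`, every target iteration
    is scheduled lexicographically after its source (tuples compare that way)."""
    pts = set(domain_pts)
    for src in domain_pts:
        dst = (src[0] + distance[0], src[1] + distance[1])
        if dst in pts and schedule_image(schedule, dst) <= schedule_image(schedule, src):
            return False
    return True
```

Under these assumptions, interchange is legal for a dependence with distance (1, 0) but not for (1, -1), because the interchanged schedule would run the target iteration (i + 1, j - 1) before its source (i, j).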